Enhanced Bleedthrough Correction for Early Music Documents with Recto-Verso Registration

Authors

  • John Ashley Burgoyne
  • Johanna Devaney
  • Laurent Pugin
  • Ichiro Fujinaga
Abstract

Ink bleedthrough is a common problem in early music documents. Even when such bleedthrough does not pose problems for human perception, it can inhibit the performance of optical music recognition (OMR). One way to reduce the amount of bleedthrough is to take into account what is printed on the reverse of the page. To do so, the reverse of the page must be registered to match the front of the page on a pixel-by-pixel basis. This paper describes our approach to registering scanned early music scores, as well as our modifications to two robust binarization approaches to take into account bleedthrough and the information available from the registration process. We determined that although the information from registration itself often makes little difference in recognition performance, other modifications to binarization algorithms for correcting bleedthrough can yield dramatic increases in OMR results.

1 MOTIVATION AND BACKGROUND

1.1 Fostering Interdisciplinary Research with OMR

“We stand at a moment of opportunity,” began Nicholas Cook in his invited talk at ISMIR 2005 in London. The opportunity is for historical musicologists and music information scientists to work together and revitalize the subdiscipline of computer-assisted empirical musicology [6]. This subdiscipline began in the 1960s [15], and although it has developed into a thriving discipline in several regions, it is largely moribund in North America, where a significant amount of other musicological research takes place. While Cook assigned musicologists a great deal of the responsibility for realizing this moment of interdisciplinary opportunity, he challenged researchers in music information retrieval to create large databases of the “highly reduced data” (e.g., scores) upon which musicological research relies.
Such electronic databases are especially important for older documents that are available in only a limited number of locations and for which archivists often restrict physical access, making it difficult to engage in large-scale comparative research. Entering musical sources into such databases by hand is highly labor-intensive, which renders music digitization projects prohibitively costly for most institutions [3]. Optical music recognition (OMR), the musical analog to optical character recognition (OCR), is the most practical means by which to create such databases. The potential for optical recognition to transform research approaches has already been demonstrated by the recent explosion in the number of searchable electronic texts available for older books and journal articles (e.g., JSTOR; see footnote 1). When treating historical documents, however, OMR and OCR systems struggle with various types of document degradation, including ink bleedthrough from the reverse side of the page [1]. Because these systems rely on the ability to distinguish foreground (ink) from background (paper), the darker the bleedthrough, the more likely it is that the bleedthrough will be classified as foreground and thereby corrupt the recognition process.

1.2 Binarization and Bleedthrough

Binarization is the common name for separating an image into its foreground and background. It is notoriously difficult to evaluate, leading most authors to resort to coarse subjective distinctions such as “better,” “same,” or “worse,” e.g., [12]. Because binarization is typically a preprocessing step in an automated image-processing pipeline with some other goal, such as OMR, one can evaluate the quality of a binarization algorithm with the metric for that ultimate goal, for example the amount of time a human editor needs to correct recognition errors.
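To make the foreground/background distinction concrete, the following minimal sketch binarizes a synthetic grayscale page with Otsu's global threshold. Otsu's method is used here only as a familiar stand-in; it is not one of the binarization approaches this paper modifies, and the function names (`otsu_threshold`, `binarize`, `synthetic_page`) and gray levels are illustrative assumptions. The sketch shows how darker bleedthrough can end up classified as ink.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 8-bit level that maximizes between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)            # fraction of pixels with value <= t
    mu = np.cumsum(prob * levels)   # cumulative first moment up to t
    w1 = 1.0 - w0                   # fraction of pixels with value > t
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu[-1] * w0 - mu) ** 2 / (w0 * w1)
    # Levels where one class is empty produce NaN or inf; treat them as 0.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Boolean mask: True where a pixel is classified as foreground (ink)."""
    return gray <= otsu_threshold(gray)


def synthetic_page(bleed_level: int) -> np.ndarray:
    """Hypothetical 100x100 page: paper = 220, ink = 30, plus bleedthrough."""
    page = np.full((100, 100), 220, dtype=np.uint8)
    page[:10, :] = 30               # 1000 pixels of true ink
    page[50:60, :] = bleed_level    # 1000 pixels of bleedthrough
    return page


# Faint bleedthrough stays in the background; dark bleedthrough is
# misclassified as foreground along with the real ink.
faint = binarize(synthetic_page(150))
dark = binarize(synthetic_page(80))
print(int(faint.sum()), int(dark.sum()))
```

In this toy setting, only the 1000 ink pixels are marked as foreground when the bleedthrough is faint, while the dark bleedthrough case marks all 2000 dark pixels. A single global threshold has no way to tell the two apart, which is the gap that information from the reverse side of the page is meant to address.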
This paper uses a method similar to that described in our recently published survey of binarization algorithms for music documents [4] to evaluate extensions to the algorithms we found to be best-performing. Although there has been some previous work on binarization in the presence of bleedthrough that focuses specifically on historical documents [9, 10], a larger body of work ...

Footnote 1: http://www.jstor.org/

Similar resources

Reduction of Bleed-through in Scanned Manuscript Documents

Many old manuscript documents were written on both sides of the paper, and the bleed-through from one side of the document to the other increases the difficulty in reading or deciphering the information on the page. This paper presents techniques for reducing such bleed-through distortion using techniques of digital image processing. Both sides of the document are scanned, maintaining full spat...

Restoration of recto-verso colour documents using correlated component analysis

In this article, we consider the problem of removing see-through interferences from pairs of recto–verso documents acquired either in grayscale or RGB modality. The see-through effect is a typical degradation of historical and archival documents or manuscripts, and is caused by transparency or seeping of ink from the reverse side of the page. We formulate the problem as one of separating two in...

Reflectance and transmittance model for recto-verso halftone prints.

We propose a spectral prediction model for predicting the reflectance and transmittance of recto-verso halftone prints. A recto-verso halftone print is modeled as a diffusing substrate surrounded by two inked interfaces in contact with air (or with another medium). The interaction of light with the print comprises three components: (a) the attenuation of the incident light penetrating the print...

Yule-Nielsen based recto-verso color halftone transmittance prediction model.

The transmittance spectrum of halftone prints on paper is predicted thanks to a model inspired by the Yule-Nielsen modified spectral Neugebauer model used for reflectance predictions. This model is well adapted for strongly scattering printing supports and applicable to recto-verso prints. Model parameters are obtained by a few transmittance measurements of calibration patches printed on one si...

A Ground Truth Bleed-Through Document Image Database

This paper introduces a new database of 25 recto/verso image pairs from documents suffering from bleed-through degradation, together with manually created foreground text masks. The structure and creation of the database is described, and three bleed-through restoration methods are compared in two ways; visually, and quantitatively using the ground truth masks.

Publication date: 2008